R / QGIS Map-Off - When the Data Scientist Meets the Visualizer

Kristian Lunow Nielsen

2018-10-27

Introduction

The aim of this GIS project is to investigate the differences between R and QGIS as GIS tools. The data used is time-series data on median income for taxpayers in the United Kingdom, from 1999 to 2014. It covers the individual boroughs of London as well as regional and national areas in the United Kingdom. The conclusion is that each program has its advantages, so which one is better depends on the goal of the task: R does a better job on the data science tasks, while QGIS makes the visualization part of the job easier.

This short project is structured so that the dis-/advantages of each program are easy to see. A short description of the data follows, covering both the raw data and the processed data that is visualised at the end. The project rounds off with an in-depth description of the dis-/advantages of both programs.

This GIS project intends to visualise the differences in growth multiples of median earnings between boroughs in London. Such differences cause people’s living standards to diverge, and they are also a drag on each individual borough’s propensity to invest in innovation. According to some economists, innovation is the main driver of future growth for economies on advanced growth paths. Furthermore, innovation can take economies from one growth path to another, so investing in innovation should be a priority for all societies.

The first step is to read in the raw data, which contains the median earnings for each borough in London as well as for regional and national areas in the United Kingdom.

Fig. 1 - Snippet of the raw data

## # A tibble: 5 x 7
##   X__1    X__2          `1999-00`       X__3   X__4   `2000-01`      X__5 
##   <chr>   <chr>         <chr>           <chr>  <chr>  <chr>          <chr>
## 1 Code    Area          Number_of_Indi~ Mean_£ Media~ Number_of_Ind~ Mean~
## 2 E09000~ City of Lond~ 10000           109800 40400  10000          1370~
## 3 E09000~ Barking and ~ 62000           16200  15100  71000          18100
## 4 E09000~ Barnet        161000          26800  18700  156000         30800
## 5 E09000~ Bexley        105000          20500  17200  116000         19800

The snippet above shows the raw data from which the average annualized growth multiple is calculated. It is clear from the snippet that the data isn’t as clean as one could wish: the real column names sit in the first data row, and every column is read as text. More importantly, the data is not at all ready to be loaded into QGIS.
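To make the cleaning step concrete, here is a minimal sketch in Python (not the R code used in this project) of how a two-row header like the one in Fig. 1 could be flattened. It assumes the year label sits on the first column of each group and that the remaining cells carry `X__n` placeholders. The measure names `Number`, `Mean` and `Median` are shortened, hypothetical labels, and the function names are invented for illustration.

```python
def flatten_header(top, sub):
    """Combine a two-row header into one name per column.

    A year label in the top row is carried rightwards over the
    placeholder cells (X__n) until the next year label appears.
    """
    names = []
    current_year = None
    for top_cell, sub_cell in zip(top, sub):
        if not top_cell.startswith("X__"):
            current_year = top_cell
        if current_year is None:
            names.append(sub_cell)  # identifier columns such as Code and Area
        else:
            names.append(f"{current_year}_{sub_cell}")
    return names


def clean_table(top, sub, rows):
    """Return one dict per row, with the measure columns converted to integers."""
    names = flatten_header(top, sub)
    cleaned = []
    for row in rows:
        record = dict(zip(names, row))
        for name in names[2:]:  # everything after Code and Area is numeric
            record[name] = int(record[name])
        cleaned.append(record)
    return cleaned


# Top header row as shown in Fig. 1 (first five columns only).
top = ["X__1", "X__2", "1999-00", "X__3", "X__4"]
# Hypothetical short labels standing in for the truncated names in Fig. 1.
sub = ["Code", "Area", "Number", "Mean", "Median"]
# Values copied from Fig. 1; codes and area names as printed in Fig. 2.
rows = [
    ["E09000002", "Barking and Dagenham", "62000", "16200", "15100"],
    ["E09000003", "Barnet", "161000", "26800", "18700"],
]

table = clean_table(top, sub, rows)
print(table[0])
```

Once each column has a single, unique name and the measures are numeric, the table is in a shape that can be exported, for example as CSV, and joined to borough boundaries.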

A growth multiple is calculated for each borough and each year; the snippet below shows the first few rows.

Fig. 2 - Snippet of the processed data

##        Code                 Area     2000      2001      2002     2003
## 2 E09000001       City of London 1.608911 0.5892308 0.9660574 1.059459
## 3 E09000002 Barking and Dagenham 1.033113 1.0384615 1.0493827 1.023529
## 4 E09000003               Barnet 1.037433 0.9639175 1.0534759 1.015228
## 5 E09000004               Bexley 1.005814 1.0578035 0.9726776 0.994382
## 6 E09000005                Brent 1.018182 0.9821429 0.9939394 1.024390
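The numbers in Fig. 2 are consistent with dividing each year's median earnings by the previous year's median. The following Python sketch (not the R code used in this project) shows that calculation. The first value is Barking and Dagenham's 1999-00 median from Fig. 1; the later values are back-calculated from the multiples in Fig. 2 and should be treated as assumptions, as should the function name.

```python
def growth_multiples(levels):
    """Return the ratio of each value to the one before it."""
    return [later / earlier for earlier, later in zip(levels, levels[1:])]


# 15100 is taken from Fig. 1; the remaining values are inferred, not read
# from the raw data.
medians = [15100, 15600, 16200, 17000]

multiples = growth_multiples(medians)
print([round(m, 6) for m in multiples])
```

A multiple above 1 means median earnings grew from one year to the next, and a multiple below 1 means they fell.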

In this project, we are only interested in the average annualized growth multiple for each borough, which is the value visualized on the map below.
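The text does not say how the yearly multiples are averaged. Because growth compounds, one reasonable choice is the geometric mean, and the Python sketch below assumes that choice; it is not necessarily the calculation used for the maps. The input values are the four multiples shown for Barking and Dagenham in Fig. 2, and the function name is hypothetical.

```python
import math


def average_growth_multiple(multiples):
    """Geometric mean: the constant yearly multiple with the same total growth."""
    return math.prod(multiples) ** (1 / len(multiples))


# Barking and Dagenham, 2000 to 2003, as printed in Fig. 2.
barking = [1.033113, 1.0384615, 1.0493827, 1.023529]

average = average_growth_multiple(barking)
print(round(average, 4))
```

Applying the resulting multiple once per year gives the same overall growth as the observed sequence, which an arithmetic mean of the multiples would not guarantee.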

Fig. 3 - Map created in R

It’s a decent map, but the creative steam has a harder time flowing.


Fig. 4 - Map created in QGIS

The map is more elegant and readable, and it is easier to include add-ons such as the national map in the right corner.


The dis-/advantages of each program are described below.